AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre:

Research Report > Experimental Study (0.95)
Research Report > Strength High (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Neural Information Processing Systems, Dec-25-2025, 20:23:25 GMT

The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Causal inference is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey the current practice in evaluation and show that the techniques we recommend are rarely used in practice. We show that such techniques are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce substantially different results than using structural measures and synthetic data.
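The contrast the abstract draws between structural and interventional measures can be made concrete with a small sketch. The abstract does not name specific measures, so the two used here are assumptions chosen for illustration: structural Hamming distance (SHD) as a structural measure and total variation distance (TVD) between interventional distributions as an interventional one. The function names and the toy graphs are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def structural_hamming_distance(true_edges: Set[Edge], learned_edges: Set[Edge]) -> int:
    """Count missing, extra, and reversed edges between two DAGs.

    Convention assumed here: a reversed edge counts once, not twice.
    Both inputs are assumed to be DAGs (no edge present in both directions).
    """
    true_pairs = {frozenset(e) for e in true_edges}
    learned_pairs = {frozenset(e) for e in learned_edges}
    # Adjacencies present in exactly one of the two graphs.
    missing_or_extra = len(true_pairs ^ learned_pairs)
    # Adjacencies present in both graphs but with opposite orientation.
    reversed_edges = sum(1 for (a, b) in learned_edges if (b, a) in true_edges)
    return missing_or_extra + reversed_edges


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two discrete distributions over outcomes."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)


# Toy example: the learned graph reverses A -> B and adds A -> C.
true_graph = {("A", "B"), ("B", "C")}
learned_graph = {("B", "A"), ("B", "C"), ("A", "C")}
shd = structural_hamming_distance(true_graph, learned_graph)

# Hypothetical distributions of an outcome under the same intervention:
# one measured empirically, one predicted by the learned model.
observed = {"y0": 0.7, "y1": 0.3}
predicted = {"y0": 0.5, "y1": 0.5}
tvd = total_variation_distance(observed, predicted)
```

SHD scores only the graph, whereas TVD scores what the model predicts will happen under an intervention, so the two can rank the same learned models differently.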

causal model, interventional measure and empirical data, name change, (5 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.60)

Amanda Gentzel, Dan Garant, David Jensen

The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Neural Information Processing Systems, Oct-9-2025, 15:07:38 GMT

We survey the current practice in evaluation and show that the techniques we recommend are rarely used in practice.

algorithm, empirical data, evaluation, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Strength High (0.94)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Petelin, Gašper, Cenikj, Gjorgjina

The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong

arXiv.org Artificial Intelligence, May-13-2025

Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are then used to train a machine learning meta-model for selecting suitable algorithms. Various approaches have demonstrated the effectiveness of these algorithm selection meta-models. However, not all evaluation approaches are equally valid for assessing the performance of meta-models. We highlight methodological issues that frequently occur in the community and should be addressed when evaluating algorithm selection approaches. First, we identify flaws with the "leave-instance-out" evaluation technique. We show that non-informative features and meta-models can achieve high accuracy, which should not be the case with a well-designed evaluation framework. Second, we demonstrate that measuring the performance of optimization algorithms with metrics sensitive to the scale of the objective function requires careful consideration of how this impacts the construction of the meta-model, its predictions, and the model's error. Such metrics can falsely present overly optimistic performance assessments of the meta-models. This paper emphasizes the importance of careful evaluation, as loosely defined methodologies can mislead researchers, divert efforts, and introduce noise into the field.
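As a rough illustration of the "leave-instance-out" split named in the abstract, the sketch below builds the folds for a toy benchmark and compares them with a grouped "leave-problem-out" split. The grouped alternative, the `(problem_id, instance_id)` sample layout, and all function names are assumptions made for this example; the abstract does not describe the authors' experimental code.

```python
from typing import Iterator, List, Set, Tuple

Sample = Tuple[int, int]  # (problem_id, instance_id), hypothetical identifiers
Fold = Tuple[List[Sample], List[Sample]]  # (train, test)


def leave_instance_out(samples: List[Sample]) -> Iterator[Fold]:
    """Hold out one instance id per fold; other instances of every problem stay in training."""
    for held_out in sorted({inst for _, inst in samples}):
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield train, test


def leave_problem_out(samples: List[Sample]) -> Iterator[Fold]:
    """Hold out every instance of one problem per fold."""
    for held_out in sorted({prob for prob, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield train, test


def shared_problems(train: List[Sample], test: List[Sample]) -> Set[int]:
    """Problems that appear on both sides of a split."""
    return {p for p, _ in train} & {p for p, _ in test}


# Toy benchmark: 3 problems, each with 3 instances.
samples = [(p, i) for p in range(1, 4) for i in range(1, 4)]

for train, test in leave_instance_out(samples):
    print("leave-instance-out overlap:", sorted(shared_problems(train, test)))
for train, test in leave_problem_out(samples):
    print("leave-problem-out overlap:", sorted(shared_problems(train, test)))
```

Under leave-instance-out every test problem also has sibling instances in the training set, so a meta-model could in principle score well by recognizing the problem rather than by using informative features. Whether that is the exact mechanism the paper reports is not stated in the abstract.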

data mining, evolutionary algorithm, machine learning, (19 more...)

2505.0775

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
Europe > France > Brittany > Finistère > Brest (0.04)

Genre: Research Report (0.82)

Industry: Transportation (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Neural Information Processing Systems, Oct-10-2024, 17:06:36 GMT

The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

causal model, evaluation technique, interventional measure and empirical data, (2 more...)

Genre: Research Report (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.64)

arXiv.org Artificial Intelligence, Apr-7-2024

Quantifying AI Vulnerabilities: A Synthesis of Complexity, Dynamical Systems, and Game Theory

Kereopa-Yorke, B

We propose a novel approach that introduces three metrics: System Complexity Index (SCI), Lyapunov Exponent for AI Stability (LEAIS), and Nash Equilibrium Robustness (NER). SCI quantifies the inherent complexity of an AI system, LEAIS captures its stability and sensitivity to perturbations, and NER evaluates its strategic robustness against adversarial manipulation. Through comparative analysis, we demonstrate the advantages of our framework over existing techniques. We discuss the theoretical and practical implications, potential applications, limitations, and future research directions. Our work contributes to the development of secure and trustworthy AI technologies by providing a holistic, theoretically grounded approach to AI security evaluation. As AI continues to advance, prioritising and advancing AI security through interdisciplinary collaboration is crucial to ensure its responsible deployment for the benefit of society.

ai system, complexity, robustness, (15 more...)

2404.10782

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Eisenreich, Tobias, Speth, Sandro, Wagner, Stefan

From Requirements to Architecture: An AI-Based Journey to Semi-Automatically Generate Software Architectures

arXiv.org Artificial Intelligence, Jan-25-2024

Designing domain models and software architectures represents a significant challenge in software development, as the resulting architectures play a vital role in fulfilling the system's quality of service. Due to time pressure, architects often model only one architecture based on their known limited domain understanding, patterns, and experience instead of thoroughly analyzing the domain and evaluating multiple candidates, selecting the best fitting. Existing approaches try to generate domain models based on requirements, but still require time-consuming manual effort to achieve good results. Therefore, in this vision paper, we propose a method to generate software architecture candidates semi-automatically based on requirements using artificial intelligence techniques. We further envision an automatic evaluation and trade-off analysis of the generated architecture candidates using, e.g., the architecture trade-off analysis method combined with large language models and quantitative analyses. To evaluate this approach, we aim to analyze the quality of the generated architecture models and the efficiency and effectiveness of our proposed process by conducting qualitative studies.

architecture, domain model, requirement, (12 more...)

doi: 10.1145/3643660.3643942

2401.14079

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.06)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Overview (0.47)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Jadon, Aryan, Patil, Avinash

A Comprehensive Survey of Evaluation Techniques for Recommendation Systems

arXiv.org Artificial Intelligence, Jan-12-2024

The effectiveness of recommendation systems is pivotal to user engagement and satisfaction in online platforms. As these recommendation systems increasingly influence user choices, their evaluation transcends mere technical performance and becomes central to business success. This paper addresses the multifaceted nature of recommendation system evaluation by introducing a comprehensive suite of metrics, each tailored to capture a distinct aspect of system performance. We discuss:

* Similarity Metrics: to quantify the precision of content-based filtering mechanisms and assess the accuracy of collaborative filtering techniques.
* Candidate Generation Metrics: to evaluate how effectively the system identifies a broad yet relevant range of items.
* Predictive Metrics: to assess the accuracy of forecasted user preferences.
* Ranking Metrics: to evaluate the effectiveness of the order in which recommendations are presented.
* Business Metrics: to align the performance of the recommendation system with economic objectives.

Our approach emphasizes the contextual application of these metrics and their interdependencies. In this paper, we identify the strengths and limitations of current evaluation practices and highlight the nuanced trade-offs that emerge when optimizing recommendation systems across different metrics. The paper concludes by proposing a framework for selecting and interpreting these metrics to not only improve system performance but also to advance business goals. This work aims to aid researchers and practitioners in critically assessing recommendation systems and to foster the development of more nuanced, effective, and economically viable personalization strategies. Our code is available on GitHub: https://github.com/aryan-jadon/Evaluation-Metrics-for-Recommendation-Systems.
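To give a sense of what the ranking metrics mentioned in the abstract measure, here is a minimal sketch of two widely used ones, precision@k and binary-relevance NDCG@k. The choice of these two metrics, the function names, and the toy data are assumptions for illustration; this is not code from the authors' repository.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the best achievable DCG."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Each relevant item at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical recommendation list and ground-truth relevant items.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c"}

print(precision_at_k(ranked_items, relevant_items, k=2))
print(ndcg_at_k(ranked_items, relevant_items, k=4))
```

Precision@k ignores where in the top k a hit occurs, while NDCG@k discounts hits that appear lower in the list, which is why the two can disagree about which of two rankings is better.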

metric, precision, recommendation system, (13 more...)

2312.16015

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

arXiv.org Artificial Intelligence, Jan-12-2022

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

Smith, Eric Michael, Hsu, Orion, Qian, Rebecca, Roller, Stephen, Boureau, Y-Lan, Weston, Jason

At the heart of improving conversational AI is the open problem of how to evaluate conversations. Issues with automatic metrics are well known (Liu et al., 2016, arXiv:1603.08023), with human evaluations still considered the gold standard. Unfortunately, how to perform human evaluations is also an open problem: differing data collection methods have varying levels of human agreement and statistical sensitivity, resulting in differing amounts of human annotation hours and labor costs. In this work we compare five different crowdworker-based human evaluation methods and find that different methods are best depending on the types of models compared, with no clear winner across the board. While this highlights the open problems in the area, our analysis leads to advice on when to use which one, and possible future directions.

evaluation, evaluation technique, win rate, (14 more...)

2201.04723

Country:

North America > United States > Ohio (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

arXiv.org Artificial Intelligence, Nov-9-2021

An effective hybrid search algorithm for the multiple traveling repairman problem with profits

Ren, Jintong, Hao, Jin-Kao, Wu, Feng, Fu, Zhang-Hua

As an extension of the traveling repairman problem with profits, the multiple traveling repairman problem with profits consists of multiple repairmen who visit a subset of all customers to maximize the revenues collected through the visited customers. To solve this challenging problem, an effective hybrid search algorithm based on the memetic algorithm framework is proposed. It integrates two distinguished features: a dedicated arc-based crossover to generate high-quality offspring solutions and a fast evaluation technique to reduce the complexity of exploring the classical neighborhoods. We show the competitiveness of the algorithm on 470 benchmark instances compared to the leading reference algorithms and report new best records for 137 instances as well as equal best results for 330 other instances. We investigate the importance of the key search components for the algorithm.

algorithm, customer, neighborhood, (15 more...)

2111.05017

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > France (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)